ProjecTILs case study - MC38 TILs

In this case study, we will build an integrated scRNA-seq analysis workflow to interpret the transcriptional and clonal structure of tumor-infiltrating T cells in MC38 colon adenocarcinoma (data from Xiong et al 2019).

The main R packages and methods employed in this workflow are:

Note that scRepertoire requires R version 4.0 or higher - you will need to have R v.4 installed to run this case study.

R Environment

Check & load R packages

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

library(renv)
renv::restore()

library(gridExtra)
library(ggplot2)
library(plotly)
library(ProjecTILs)
library(scRepertoire)

scRNA-seq data preparation

Download scRNA-seq data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7919/) After download and unpacking (you will need curl and unzip), you should get three files: matrix.mtx, genes.tsv and barcodes.tsv

files <- c("E-MTAB-7919.processed.1.zip", "E-MTAB-7919.processed.2.zip", "E-MTAB-7919.processed.3.zip")
matrix_dir <- "./input/Xiong_TIL/matrix"
system(sprintf("mkdir -p %s", matrix_dir))

for (i in 1:length(files)) {
    data_path <- sprintf("https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7919/%s",
        files[i])
    system(sprintf("curl %s --output %s/data.zip", data_path, matrix_dir))
    system(sprintf("unzip -o %s/data.zip -d %s", matrix_dir, matrix_dir))
}
system(sprintf("rm %s/data.zip", matrix_dir))

Load scRNA-seq data and store as Seurat object

projectID <- "Xiong_TIL"
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7

exp_mat <- Read10X(matrix_dir)
querydata <- CreateSeuratObject(counts = exp_mat, project = projectID, min.cells = 3,
    min.features = 50)
querydata$Sample <- substring(colnames(querydata), 18)
table(querydata$Sample)

   4    5    6    7 
1759 1746 1291 1386 
querydata$SampleLabel <- factor(querydata$Sample, levels = c(4:7), labels = libIDtoSampleID)
table(querydata$SampleLabel)

Mouse 1 Mouse 2 Mouse 3 Mouse 4 
   1759    1746    1291    1386 

scTCR data preparation

Download scTCR data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7918/)

data_path <- "https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7918/E-MTAB-7918.processed.1.zip"
tcr_dir <- "./input/Xiong_TIL/TCR"

system(sprintf("mkdir -p %s", tcr_dir))
system(sprintf("curl %s --output %s/data.zip", data_path, tcr_dir))
system(sprintf("unzip -o %s/data.zip -d %s", tcr_dir, tcr_dir))

Mouse 1 to 4 (sample ID 4 to 7) correspond to TCR-seq libraries 35 to 38

libIDtoSampleID_VDJ <- 4:7
names(libIDtoSampleID_VDJ) <- 35:38

vdj.list <- list()
for (i in 1:length(libIDtoSampleID_VDJ)) {
    s <- names(libIDtoSampleID_VDJ)[i]
    vdj.list[[i]] <- read.csv(sprintf("%s/filtered_contig_annotations_%s.csv", tcr_dir,
        s), as.is = T)

    # Rename barcodes to match scRNA-seq suffixes
    vdj.list[[i]]$barcode <- sub("\\d$", "", vdj.list[[i]]$barcode)
    vdj.list[[i]]$barcode <- paste0(vdj.list[[i]]$barcode, libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$raw_clonotype_id <- paste0(vdj.list[[i]]$raw_clonotype_id, "-",
        libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$SampleLabel <- libIDtoSampleID_VDJ[i]

}

Combine alpha and beta chains using the combineTCR function from scRepertoire

# Using parameters removeNA=T and removeMulti=T will remove cells with multiple
# a-b combinations
combined <- combineTCR(vdj.list, samples = libIDtoSampleID_VDJ, ID = names(libIDtoSampleID_VDJ),
    cells = "T-AB", removeNA = T, removeMulti = T)

for (i in seq_along(combined)) {
    combined[[i]] <- stripBarcode(combined[[i]], column = 1, connector = "_", num_connects = 3)
}

The function ‘combineExpression’ of scRepertoire allows incorporating clonotype information to a Seurat object, and creates the categorical variable ‘cloneTypes’ discretizing frequencies of the clonotypes

querydata <- combineExpression(combined, querydata, cloneCall = "gene", groupBy = "none")

We have now paired expression and TCR data for the query samples, and loaded them into a unified Seurat object. We can proceed to project the data onto the reference atlas.

ProjecTILs

Load the reference atlas

ref <- load.reference.map()
[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Github/ProjecTILs_CaseStudies/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Reference map ref_TILAtlas_mouse_v1"

Project query data (with clonotype information stored as metadata) onto the TIL reference atlas

query.projected <- make.projection(querydata, ref = ref, ncores = 2)
[1] "Using assay RNA for query"
[1] "135 out of 6182 ( 2% ) non-pure cells removed. Use filter.cells=FALSE to avoid pre-filtering (NOT RECOMMENDED)"
[1] "Aligning query to reference map for batch-correction..."

Projecting corrected query onto Reference PCA space

Projecting corrected query onto Reference UMAP space

Visualization of projected data.

T cells are projected mostly in the Treg, CD8 Terminal exhausted (CD8_Tex) and Precursor exhausted (CD8_Tpex) areas, with some clusters in the T-helper (Th1) and CD8 Effector memory areas, and to a lesser extent in Naive-like and CD8 Early-Activated areas.

p1 <- plot.projection(ref)
p2 <- plot.projection(ref, query.projected, linesize = 0.5, pointsize = 0.5)
grid.arrange(p1, p2, ncol = 2)

Visualize the projections per sample. Broadly, the distribution across the atlas is similar for the four mice, with some variation in the frequency of Effector Memory T cells.

plots <- list()

sample_names <- unique(query.projected$SampleLabel)
for (sample_i in seq_along(sample_names)) {
    sample <- sample_names[sample_i]
    plots[[sample_i]] <- plot.projection(ref, query.projected[, query.projected$SampleLabel ==
        sample]) + ggtitle(sample)
}

grid.arrange(grobs = plots, ncol = 2)

Classify projected T cells into cell subtypes/states

query.projected <- cellstate.predict(ref = ref, query = query.projected)
table(query.projected$functional.cluster)  #Cell state assignment is stored in the 'functional.cluster' metadata field

     CD4_NaiveLike     CD8_EarlyActiv CD8_EffectorMemory      CD8_NaiveLike 
                31                161                622                356 
           CD8_Tex           CD8_Tpex                Tfh                Th1 
              1806                440                  8                554 
              Treg 
              2069 

Look at distribution of T cells in terms of cell states.

plot.statepred.composition(ref, query.projected, metric = "Percent")

We can check the gene expression profile of the cells assigned to each state (yellow), and compare them to those of the reference states (black).

plot.states.radar(ref = ref, query.projected, min.cells = 30)

For example, cells projected into Tex and Tpex region express Tox and Pdcd1, while only Tex express Gzmb and Havcr2, as expected.

Clonality analysis

Now, let’s see where the expanded clones are located within the map:

levs <- c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)",
    "Small (1 < X <= 5)", "Single (0 < X <= 1)", NA)
palette <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))

query.projected$cloneType <- factor(query.projected$cloneType, levels = levs)
DimPlot(query.projected, group.by = "cloneType") + scale_color_manual(values = c(palette(5)),
    na.value = "grey")

Larger (most highly expanded) clones are concentrated in the Tex/Tpex area. This is largely explained by the fact that sustained antigenic stimulation drives immunodominant clones into the Tox+-driven exhausted lineage (Tex - Tpex region)

Let’s check the raw numbers

table(query.projected@meta.data$cloneType, query.projected@meta.data$functional.cluster)
                                
                                 CD4_NaiveLike CD8_EarlyActiv
  Hyperexpanded (100 < X <= 500)             0              0
  Large (20 < X <= 100)                      0              1
  Medium (5 < X <= 20)                       0              2
  Small (1 < X <= 5)                         0              6
  Single (0 < X <= 1)                       23             64
                                
                                 CD8_EffectorMemory CD8_NaiveLike CD8_Tex
  Hyperexpanded (100 < X <= 500)                  1             0      86
  Large (20 < X <= 100)                          99             0     413
  Medium (5 < X <= 20)                           46             1     318
  Small (1 < X <= 5)                             98            10     222
  Single (0 < X <= 1)                           140           162     143
                                
                                 CD8_Tpex Tfh Th1 Treg
  Hyperexpanded (100 < X <= 500)       31   0   0    3
  Large (20 < X <= 100)                97   0   0    8
  Medium (5 < X <= 20)                 69   0  26  138
  Small (1 < X <= 5)                   56   2 118  242
  Single (0 < X <= 1)                  32   1 172  307

Plot clonal size by functional cluster

meta <- melt(table(query.projected@meta.data[!is.na(query.projected@meta.data$Frequency),
    c("functional.cluster", "cloneType")]), varnames = c("functional.cluster", "cloneType"))
ggplot(data = meta, aes(x = functional.cluster, y = value, fill = cloneType)) + geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + scale_fill_manual(values = c(palette(5)),
    na.value = "grey")

This confirms the intial observation that Tex and Tpex present the largest clonal expansion

We can highlight specific clonotypes on the reference atlas (here those with at least 40 cells):

clone_call = "CTaa"  #select column/variable to use as clonotypes ID, in this case CTaa, the paired TCR CDR3 aminoacid sequences
cutoff <- 35  #Min cells for clonotype

clonotypeSizes <- sort(table(query.projected[[clone_call]])[table(query.projected[[clone_call]]) >
    cutoff], decreasing = T)

bigClonotypes <- names(clonotypeSizes)

plots <- list()

for (i in 1:length(bigClonotypes)) {
    ctype <- bigClonotypes[i]

    plots[[i]] <- plot.projection(ref, query.projected[, which(query.projected[[clone_call]] ==
        ctype)]) + ggtitle(sprintf("%s - size %i", ctype, clonotypeSizes[ctype]))
}

grid.arrange(grobs = plots, ncol = 2)

The majority of clones tend to span Tex and Tpex states. Indeed, Tcf7+ precursor exhausted Tpex cells self-renew and give rise to more differentiated Tex effector cells.

The clonal overlap (Morisita similarity index) implemented in scRepertoire confirms this pattern:

meta.list <- expression2List(query.projected, group = "functional.cluster")
clonalOverlap(meta.list, cloneCall = "gene", method = "morisita") + theme(axis.text.x = element_text(angle = 90,
    hjust = 1, vjust = 0.5))

We can also visualize overlap of the top CD8 clones, and their overlap across functional clusters, using scRepertoire’s alluvial plot

compareClonotypes(meta.list, numbers = 6, samples = c("CD8_EffectorMemory", "CD8_Tpex",
    "CD8_Tex", "CD8_NaiveLike", "CD8_EarlyActiv"), cloneCall = "aa", graph = "alluvial") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Conclusions

Projection of the MC38 TILs onto the reference atlas showed that these T cell samples mostly consist of exhausted (CD8_Tex), precursor-exhausted CD8 T cells (CD8_Tpex), and Tregs, typical of highly immunogenic, “hot” tumors. Also, the majority of expanded clones were found spanning the CD8_Tex and CD8_Tpex states. This is expected as sustained antigen stimulation in the tumor drives immuno-dominant tumor-reactive clones towards the exhaustion differentiation path.
The combination of ProjecTILs and scRepertoire simplifies the joint analysis of single-cell expression data and clonotype analysis, in the context of an annotated reference atlas of TIL states.

Further reading

Original publication - Xiong et al. (2019) Cancer Immunol Res

ProjecTILs case studies - INDEX - Repository

The ProjecTILs method Andreatta et. al (2021) Nat. Comm. and code